
Jiarui Zhang

Stein-Encoder: A White-Box Supervised Encoder via Stein Identities in Multi-Modal Studies

May 25, 2026
Via arXiv

AgroTools: A Benchmark for Tool-Augmented Multimodal Agents in Agriculture

May 21, 2026
Via arXiv

S^2tory: Story Spine Distillation for Movie Script Summarization

May 05, 2026
Via arXiv

OVPD: A Virtual-Physical Fusion Testing Dataset of OnSite Autonomous Driving Challenge

Apr 22, 2026
Via arXiv

Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models

Apr 20, 2026
Via arXiv

From Myopic Selection to Long-Horizon Awareness: Sequential LLM Routing for Multi-Turn Dialogue

Apr 14, 2026
Via arXiv

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Mar 30, 2026
Via arXiv

DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation

Mar 20, 2026
Via arXiv

AgroNVILA: Perception-Reasoning Decoupling for Multi-view Agricultural Multimodal Large Language Models

Mar 15, 2026
Via arXiv

MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Mar 10, 2026
Via arXiv